OCR exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Setting up a Competition Framework for the Evaluation of Structure Extraction from OCR-ed Books

Internal identifier: 000125 (France/Analysis); previous: 000124; next: 000126

Authors: Antoine Doucet [France]; Gabriella Kazai [United States]; Bodin Dresevic [Serbia]; Aleksandar Uzelac [Serbia]; Bogdan Radakovic [Serbia]; Nikola Todic [Serbia]

Source:

RBID : Hal:hal-01070398

Abstract

This paper describes the setup of the Book Structure Extraction competition run at ICDAR 2009. The goal of the competition was to evaluate and compare automatic techniques for deriving structure information from digitized books, which could then be used to aid navigation inside the books. More specifically, the task that participants faced was to construct hyperlinked tables of contents for a collection of 1,000 digitized books. This paper describes the setup of the competition and its challenges. It introduces and discusses the book collection used in the task, the collaborative construction of the ground truth, the evaluation measures and the evaluation results. The paper also introduces a data set to be used freely for research evaluation purposes.

Url:


Affiliations:


Links toward previous steps (curation, corpus...)


Links to Exploration step

Hal:hal-01070398

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Setting up a Competition Framework for the Evaluation of Structure Extraction from OCR-ed Books</title>
<author>
<name sortKey="Doucet, Antoine" sort="Doucet, Antoine" uniqKey="Doucet A" first="Antoine" last="Doucet">Antoine Doucet</name>
<affiliation wicri:level="1">
<hal:affiliation type="researchteam" xml:id="struct-388300" status="VALID">
<orgName>Equipe Hultech - Laboratoire GREYC - UMR6072</orgName>
<desc>
<address>
<country key="FR"></country>
</address>
</desc>
<listRelation>
<relation active="#struct-150" type="direct"></relation>
<relation name="UMR6072" active="#struct-441569" type="indirect"></relation>
<relation active="#struct-300358" type="indirect"></relation>
<relation active="#struct-300266" type="indirect"></relation>
</listRelation>
<tutelles>
<tutelle active="#struct-150" type="direct">
<org type="laboratory" xml:id="struct-150" status="VALID">
<orgName>Groupe de Recherche en Informatique, Image, Automatique et Instrumentation de Caen</orgName>
<orgName type="acronym">GREYC</orgName>
<desc>
<address>
<addrLine>Boulevard du Maréchal Juin - 14050 CAEN Cedex</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.greyc.fr</ref>
</desc>
<listRelation>
<relation name="UMR6072" active="#struct-441569" type="direct"></relation>
<relation active="#struct-300358" type="direct"></relation>
<relation active="#struct-300266" type="direct"></relation>
</listRelation>
</org>
</tutelle>
<tutelle name="UMR6072" active="#struct-441569" type="indirect">
<org type="institution" xml:id="struct-441569" status="VALID">
<idno type="IdRef">02636817X</idno>
<idno type="ISNI">0000000122597504</idno>
<orgName>Centre National de la Recherche Scientifique</orgName>
<orgName type="acronym">CNRS</orgName>
<date type="start">1939-10-19</date>
<desc>
<address>
<country key="FR"></country>
</address>
<ref type="url">http://www.cnrs.fr/</ref>
</desc>
</org>
</tutelle>
<tutelle active="#struct-300358" type="indirect">
<org type="institution" xml:id="struct-300358" status="VALID">
<orgName>Ecole Nationale Supérieure d'Ingénieurs de Caen</orgName>
<desc>
<address>
<country key="FR"></country>
</address>
</desc>
</org>
</tutelle>
<tutelle active="#struct-300266" type="indirect">
<org type="institution" xml:id="struct-300266" status="INCOMING">
<orgName>Université de Caen Basse-Normandie</orgName>
<desc>
<address>
<country key="FR"></country>
</address>
</desc>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
<country>France</country>
<placeName>
<settlement type="city">Caen</settlement>
<region type="region" nuts="2">Basse-Normandie</region>
</placeName>
<orgName type="university">Université de Caen Basse-Normandie</orgName>
</affiliation>
</author>
<author>
<name sortKey="Kazai, Gabriella" sort="Kazai, Gabriella" uniqKey="Kazai G" first="Gabriella" last="Kazai">Gabriella Kazai</name>
<affiliation wicri:level="1">
<hal:affiliation type="laboratory" xml:id="struct-28609" status="VALID">
<orgName>Microsoft Research [Redmond]</orgName>
<desc>
<address>
<addrLine>One Microsoft Way, Redmond, WA 98052, USA</addrLine>
<country key="US"></country>
</address>
<ref type="url">http://research.microsoft.com/</ref>
</desc>
<listRelation>
<relation active="#struct-379481" type="direct"></relation>
</listRelation>
<tutelles>
<tutelle active="#struct-379481" type="direct">
<org type="institution" xml:id="struct-379481" status="VALID">
<orgName>Microsoft Corporation [Redmond, Wash.]</orgName>
<desc>
<address>
<country key="US"></country>
</address>
<ref type="url">https://www.microsoft.com/fr-fr/</ref>
</desc>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
<country>États-Unis</country>
</affiliation>
</author>
<author>
<name sortKey="Dresevic, Bodin" sort="Dresevic, Bodin" uniqKey="Dresevic B" first="Bodin" last="Dresevic">Bodin Dresevic</name>
<affiliation wicri:level="1">
<hal:affiliation type="laboratory" xml:id="struct-267919" status="INCOMING">
<orgName>Microsoft Development Center Serbia</orgName>
<desc>
<address>
<country key="RS"></country>
</address>
</desc>
<listRelation>
<relation active="#struct-364048" type="direct"></relation>
</listRelation>
<tutelles>
<tutelle active="#struct-364048" type="direct">
<org type="institution" xml:id="struct-364048" status="INCOMING">
<orgName>Microsoft Development Center Serbia</orgName>
<desc>
<address>
<country key="FR"></country>
</address>
</desc>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
<country>Serbie</country>
</affiliation>
</author>
<author>
<name sortKey="Uzelac, Aleksandar" sort="Uzelac, Aleksandar" uniqKey="Uzelac A" first="Aleksandar" last="Uzelac">Aleksandar Uzelac</name>
<affiliation wicri:level="1">
<hal:affiliation type="laboratory" xml:id="struct-267919" status="INCOMING">
<orgName>Microsoft Development Center Serbia</orgName>
<desc>
<address>
<country key="RS"></country>
</address>
</desc>
<listRelation>
<relation active="#struct-364048" type="direct"></relation>
</listRelation>
<tutelles>
<tutelle active="#struct-364048" type="direct">
<org type="institution" xml:id="struct-364048" status="INCOMING">
<orgName>Microsoft Development Center Serbia</orgName>
<desc>
<address>
<country key="FR"></country>
</address>
</desc>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
<country>Serbie</country>
</affiliation>
</author>
<author>
<name sortKey="Radakovic, Bogdan" sort="Radakovic, Bogdan" uniqKey="Radakovic B" first="Bogdan" last="Radakovic">Bogdan Radakovic</name>
<affiliation wicri:level="1">
<hal:affiliation type="laboratory" xml:id="struct-267919" status="INCOMING">
<orgName>Microsoft Development Center Serbia</orgName>
<desc>
<address>
<country key="RS"></country>
</address>
</desc>
<listRelation>
<relation active="#struct-364048" type="direct"></relation>
</listRelation>
<tutelles>
<tutelle active="#struct-364048" type="direct">
<org type="institution" xml:id="struct-364048" status="INCOMING">
<orgName>Microsoft Development Center Serbia</orgName>
<desc>
<address>
<country key="FR"></country>
</address>
</desc>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
<country>Serbie</country>
</affiliation>
</author>
<author>
<name sortKey="Todic, Nikola" sort="Todic, Nikola" uniqKey="Todic N" first="Nikola" last="Todic">Nikola Todic</name>
<affiliation wicri:level="1">
<hal:affiliation type="laboratory" xml:id="struct-267919" status="INCOMING">
<orgName>Microsoft Development Center Serbia</orgName>
<desc>
<address>
<country key="RS"></country>
</address>
</desc>
<listRelation>
<relation active="#struct-364048" type="direct"></relation>
</listRelation>
<tutelles>
<tutelle active="#struct-364048" type="direct">
<org type="institution" xml:id="struct-364048" status="INCOMING">
<orgName>Microsoft Development Center Serbia</orgName>
<desc>
<address>
<country key="FR"></country>
</address>
</desc>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
<country>Serbie</country>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">HAL</idno>
<idno type="RBID">Hal:hal-01070398</idno>
<idno type="halId">hal-01070398</idno>
<idno type="halUri">https://hal.archives-ouvertes.fr/hal-01070398</idno>
<idno type="url">https://hal.archives-ouvertes.fr/hal-01070398</idno>
<date when="2011">2011</date>
<idno type="wicri:Area/Hal/Corpus">000107</idno>
<idno type="wicri:Area/Hal/Curation">000107</idno>
<idno type="wicri:Area/Hal/Checkpoint">000097</idno>
<idno type="wicri:Area/Main/Merge">000589</idno>
<idno type="wicri:Area/Main/Curation">000583</idno>
<idno type="wicri:Area/Main/Exploration">000583</idno>
<idno type="wicri:Area/France/Extraction">000125</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en">Setting up a Competition Framework for the Evaluation of Structure Extraction from OCR-ed Books</title>
<author>
<name sortKey="Doucet, Antoine" sort="Doucet, Antoine" uniqKey="Doucet A" first="Antoine" last="Doucet">Antoine Doucet</name>
<affiliation wicri:level="1">
<hal:affiliation type="researchteam" xml:id="struct-388300" status="VALID">
<orgName>Equipe Hultech - Laboratoire GREYC - UMR6072</orgName>
<desc>
<address>
<country key="FR"></country>
</address>
</desc>
<listRelation>
<relation active="#struct-150" type="direct"></relation>
<relation name="UMR6072" active="#struct-441569" type="indirect"></relation>
<relation active="#struct-300358" type="indirect"></relation>
<relation active="#struct-300266" type="indirect"></relation>
</listRelation>
<tutelles>
<tutelle active="#struct-150" type="direct">
<org type="laboratory" xml:id="struct-150" status="VALID">
<orgName>Groupe de Recherche en Informatique, Image, Automatique et Instrumentation de Caen</orgName>
<orgName type="acronym">GREYC</orgName>
<desc>
<address>
<addrLine>Boulevard du Maréchal Juin - 14050 CAEN Cedex</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.greyc.fr</ref>
</desc>
<listRelation>
<relation name="UMR6072" active="#struct-441569" type="direct"></relation>
<relation active="#struct-300358" type="direct"></relation>
<relation active="#struct-300266" type="direct"></relation>
</listRelation>
</org>
</tutelle>
<tutelle name="UMR6072" active="#struct-441569" type="indirect">
<org type="institution" xml:id="struct-441569" status="VALID">
<idno type="IdRef">02636817X</idno>
<idno type="ISNI">0000000122597504</idno>
<orgName>Centre National de la Recherche Scientifique</orgName>
<orgName type="acronym">CNRS</orgName>
<date type="start">1939-10-19</date>
<desc>
<address>
<country key="FR"></country>
</address>
<ref type="url">http://www.cnrs.fr/</ref>
</desc>
</org>
</tutelle>
<tutelle active="#struct-300358" type="indirect">
<org type="institution" xml:id="struct-300358" status="VALID">
<orgName>Ecole Nationale Supérieure d'Ingénieurs de Caen</orgName>
<desc>
<address>
<country key="FR"></country>
</address>
</desc>
</org>
</tutelle>
<tutelle active="#struct-300266" type="indirect">
<org type="institution" xml:id="struct-300266" status="INCOMING">
<orgName>Université de Caen Basse-Normandie</orgName>
<desc>
<address>
<country key="FR"></country>
</address>
</desc>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
<country>France</country>
<placeName>
<settlement type="city">Caen</settlement>
<region type="region" nuts="2">Basse-Normandie</region>
</placeName>
<orgName type="university">Université de Caen Basse-Normandie</orgName>
</affiliation>
</author>
<author>
<name sortKey="Kazai, Gabriella" sort="Kazai, Gabriella" uniqKey="Kazai G" first="Gabriella" last="Kazai">Gabriella Kazai</name>
<affiliation wicri:level="1">
<hal:affiliation type="laboratory" xml:id="struct-28609" status="VALID">
<orgName>Microsoft Research [Redmond]</orgName>
<desc>
<address>
<addrLine>One Microsoft Way, Redmond, WA 98052, USA</addrLine>
<country key="US"></country>
</address>
<ref type="url">http://research.microsoft.com/</ref>
</desc>
<listRelation>
<relation active="#struct-379481" type="direct"></relation>
</listRelation>
<tutelles>
<tutelle active="#struct-379481" type="direct">
<org type="institution" xml:id="struct-379481" status="VALID">
<orgName>Microsoft Corporation [Redmond, Wash.]</orgName>
<desc>
<address>
<country key="US"></country>
</address>
<ref type="url">https://www.microsoft.com/fr-fr/</ref>
</desc>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
<country>États-Unis</country>
</affiliation>
</author>
<author>
<name sortKey="Dresevic, Bodin" sort="Dresevic, Bodin" uniqKey="Dresevic B" first="Bodin" last="Dresevic">Bodin Dresevic</name>
<affiliation wicri:level="1">
<hal:affiliation type="laboratory" xml:id="struct-267919" status="INCOMING">
<orgName>Microsoft Development Center Serbia</orgName>
<desc>
<address>
<country key="RS"></country>
</address>
</desc>
<listRelation>
<relation active="#struct-364048" type="direct"></relation>
</listRelation>
<tutelles>
<tutelle active="#struct-364048" type="direct">
<org type="institution" xml:id="struct-364048" status="INCOMING">
<orgName>Microsoft Development Center Serbia</orgName>
<desc>
<address>
<country key="FR"></country>
</address>
</desc>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
<country>Serbie</country>
</affiliation>
</author>
<author>
<name sortKey="Uzelac, Aleksandar" sort="Uzelac, Aleksandar" uniqKey="Uzelac A" first="Aleksandar" last="Uzelac">Aleksandar Uzelac</name>
<affiliation wicri:level="1">
<hal:affiliation type="laboratory" xml:id="struct-267919" status="INCOMING">
<orgName>Microsoft Development Center Serbia</orgName>
<desc>
<address>
<country key="RS"></country>
</address>
</desc>
<listRelation>
<relation active="#struct-364048" type="direct"></relation>
</listRelation>
<tutelles>
<tutelle active="#struct-364048" type="direct">
<org type="institution" xml:id="struct-364048" status="INCOMING">
<orgName>Microsoft Development Center Serbia</orgName>
<desc>
<address>
<country key="FR"></country>
</address>
</desc>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
<country>Serbie</country>
</affiliation>
</author>
<author>
<name sortKey="Radakovic, Bogdan" sort="Radakovic, Bogdan" uniqKey="Radakovic B" first="Bogdan" last="Radakovic">Bogdan Radakovic</name>
<affiliation wicri:level="1">
<hal:affiliation type="laboratory" xml:id="struct-267919" status="INCOMING">
<orgName>Microsoft Development Center Serbia</orgName>
<desc>
<address>
<country key="RS"></country>
</address>
</desc>
<listRelation>
<relation active="#struct-364048" type="direct"></relation>
</listRelation>
<tutelles>
<tutelle active="#struct-364048" type="direct">
<org type="institution" xml:id="struct-364048" status="INCOMING">
<orgName>Microsoft Development Center Serbia</orgName>
<desc>
<address>
<country key="FR"></country>
</address>
</desc>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
<country>Serbie</country>
</affiliation>
</author>
<author>
<name sortKey="Todic, Nikola" sort="Todic, Nikola" uniqKey="Todic N" first="Nikola" last="Todic">Nikola Todic</name>
<affiliation wicri:level="1">
<hal:affiliation type="laboratory" xml:id="struct-267919" status="INCOMING">
<orgName>Microsoft Development Center Serbia</orgName>
<desc>
<address>
<country key="RS"></country>
</address>
</desc>
<listRelation>
<relation active="#struct-364048" type="direct"></relation>
</listRelation>
<tutelles>
<tutelle active="#struct-364048" type="direct">
<org type="institution" xml:id="struct-364048" status="INCOMING">
<orgName>Microsoft Development Center Serbia</orgName>
<desc>
<address>
<country key="FR"></country>
</address>
</desc>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
<country>Serbie</country>
</affiliation>
</author>
</analytic>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">This paper describes the setup of the Book Structure Extraction competition run at ICDAR 2009. The goal of the competition was to evaluate and compare automatic techniques for deriving structure information from digitized books, which could then be used to aid navigation inside the books. More specifically, the task that participants faced was to construct hyperlinked tables of contents for a collection of 1,000 digitized books. This paper describes the setup of the competition and its challenges. It introduces and discusses the book collection used in the task, the collaborative construction of the ground truth, the evaluation measures and the evaluation results. The paper also introduces a data set to be used freely for research evaluation purposes.</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>France</li>
<li>Serbie</li>
<li>États-Unis</li>
</country>
<region>
<li>Basse-Normandie</li>
</region>
<settlement>
<li>Caen</li>
</settlement>
<orgName>
<li>Université de Caen Basse-Normandie</li>
</orgName>
</list>
<tree>
<country name="France">
<region name="Basse-Normandie">
<name sortKey="Doucet, Antoine" sort="Doucet, Antoine" uniqKey="Doucet A" first="Antoine" last="Doucet">Antoine Doucet</name>
</region>
</country>
<country name="États-Unis">
<noRegion>
<name sortKey="Kazai, Gabriella" sort="Kazai, Gabriella" uniqKey="Kazai G" first="Gabriella" last="Kazai">Gabriella Kazai</name>
</noRegion>
</country>
<country name="Serbie">
<noRegion>
<name sortKey="Dresevic, Bodin" sort="Dresevic, Bodin" uniqKey="Dresevic B" first="Bodin" last="Dresevic">Bodin Dresevic</name>
</noRegion>
<name sortKey="Radakovic, Bogdan" sort="Radakovic, Bogdan" uniqKey="Radakovic B" first="Bogdan" last="Radakovic">Bogdan Radakovic</name>
<name sortKey="Todic, Nikola" sort="Todic, Nikola" uniqKey="Todic N" first="Nikola" last="Todic">Nikola Todic</name>
<name sortKey="Uzelac, Aleksandar" sort="Uzelac, Aleksandar" uniqKey="Uzelac A" first="Aleksandar" last="Uzelac">Aleksandar Uzelac</name>
</country>
</tree>
</affiliations>
</record>
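
As displayed above, the record uses the `hal:` and `wicri:` prefixes without declaring them, so a strict XML parser is likely to reject it with an "unbound prefix" error. The following is a minimal Python sketch, not part of Dilib, showing one way to work around this and list each author with the country recorded in the direct `<country>` child of `<affiliation>`. The namespace URIs, the function names, and the trimmed-down `SAMPLE` are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

# Trimmed-down sample mirroring the shape of the record above (two authors only).
SAMPLE = """<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Setting up a Competition Framework for the Evaluation of Structure Extraction from OCR-ed Books</title>
<author>
<name sortKey="Doucet, Antoine" first="Antoine" last="Doucet">Antoine Doucet</name>
<affiliation wicri:level="1">
<hal:affiliation type="researchteam" xml:id="struct-388300" status="VALID">
<orgName>Equipe Hultech - Laboratoire GREYC - UMR6072</orgName>
<desc>
<address>
<country key="FR"></country>
</address>
</desc>
</hal:affiliation>
<country>France</country>
</affiliation>
</author>
<author>
<name sortKey="Kazai, Gabriella" first="Gabriella" last="Kazai">Gabriella Kazai</name>
<affiliation wicri:level="1">
<country>États-Unis</country>
</affiliation>
</author>
</titleStmt>
</fileDesc>
</teiHeader>
</TEI>
</record>"""

# Hypothetical placeholder URIs: the record as displayed declares no namespaces,
# so these exist only to let the parser accept the hal: and wicri: prefixes.
PLACEHOLDER_NS = {
    "hal": "urn:example:hal",
    "wicri": "urn:example:wicri",
}


def bind_prefixes(xml_text, root_tag="record"):
    """Add placeholder xmlns declarations to the first <record> start tag."""
    decls = "".join(f' xmlns:{p}="{u}"' for p, u in PLACEHOLDER_NS.items())
    return xml_text.replace(f"<{root_tag}>", f"<{root_tag}{decls}>", 1)


def authors_with_country(xml_text):
    """Return (author name, country) pairs from the titleStmt of a record."""
    root = ET.fromstring(bind_prefixes(xml_text))
    pairs = []
    for author in root.findall("./TEI/teiHeader/fileDesc/titleStmt/author"):
        name = author.findtext("name", default="").strip()
        # Only the direct <country> child of <affiliation> is read here,
        # not the empty <country key="..."> elements nested deeper.
        country = author.findtext("affiliation/country", default="").strip()
        pairs.append((name, country))
    return pairs


if __name__ == "__main__":
    for name, country in authors_with_country(SAMPLE):
        print(f"{name}: {country}")
```

Reading only from `titleStmt` avoids counting each author twice, since the same author list is repeated under `sourceDesc/biblStruct/analytic`.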

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/France/Analysis
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000125 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/France/Analysis/biblio.hfd -nk 000125 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    France
   |étape=   Analysis
   |type=    RBID
   |clé=     Hal:hal-01070398
   |texte=   Setting up a Competition Framework for the Evaluation of Structure Extraction from OCR-ed Books
}}

Wicri

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024